Segway: simultaneous segmentation of multiple functional genomics data sets with heterogeneous patterns of missing data
Abstract
New functional genomics methods enabled by high-throughput DNA sequencing have begun to produce an unprecedented amount of data anchored to the genome of humans and other species. We have developed a method to identify joint patterns in the results of multiple classes of functional genomics experiments. The method partitions the genome into variable-length segments using a dynamic Bayesian network where the dynamic (or “time”) axis represents genomic position. Segments are assigned one of a finite number of labels such that the vectors of observations are similar in segments with the same label. A multinet switching structure allows inference on sequences with combinations of missing data in different tracks that vary at each position, without downsampling or interpolation. This permits us to take full advantage of the high-resolution data generated by sequencing assays, working at up to 1-base-pair resolution. Our system can also incorporate other kinds of data into its classification, including lower-resolution continuous data such as microarray data, or discrete data such as the dinucleotide sequence beginning at each position. We demonstrate the use of the method in both unsupervised and semisupervised training of segment parameters.
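The way missing data can be handled without downsampling or interpolation can be illustrated with a much-simplified sketch. This is not the authors' implementation: it assumes a plain hidden Markov model with independent Gaussian emissions per track in place of the full dynamic Bayesian network, and the function names (`log_emission`, `viterbi`), parameters, and toy numbers are all hypothetical. The one idea it tries to capture is that, at each position, only the tracks that are actually observed contribute to the likelihood of a label.

```python
import numpy as np

def log_emission(obs, means, variances):
    """Log-likelihood of each position under each label, skipping missing tracks.

    obs:       (T, D) array of T positions by D tracks; NaN marks a missing value.
    means:     (K, D) array of per-label, per-track Gaussian means.
    variances: (K, D) array of per-label, per-track Gaussian variances.
    Returns a (T, K) array.
    """
    present = ~np.isnan(obs)                        # (T, D) presence mask
    filled = np.where(present, obs, 0.0)            # placeholder, masked out below
    diff = filled[:, None, :] - means[None, :, :]   # (T, K, D)
    ll = -0.5 * (np.log(2 * np.pi * variances)[None] + diff ** 2 / variances[None])
    # A missing track contributes log(1) = 0, so it neither helps nor hurts any label.
    return np.where(present[:, None, :], ll, 0.0).sum(axis=2)

def viterbi(obs, means, variances, log_trans, log_init):
    """Most probable label sequence; log_trans[i, j] is the log-probability of i -> j."""
    T, K = obs.shape[0], means.shape[0]
    le = log_emission(obs, means, variances)
    score = log_init + le[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans           # cand[i, j]: best path ending i -> j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + le[t]
    path = np.empty(T, dtype=int)
    path[-1] = score.argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

# Toy data: two tracks, two labels, a different missingness pattern at each position.
nan = np.nan
obs = np.array([[0.0, nan], [0.1, 0.0], [nan, 5.0], [5.0, nan], [nan, nan], [5.0, 5.0]])
means = np.array([[0.0, 0.0], [5.0, 5.0]])
variances = np.ones((2, 2))
log_trans = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))
log_init = np.log(np.array([0.5, 0.5]))
labels = viterbi(obs, means, variances, log_trans, log_init)
```

In this toy run the fifth position has no data in either track, so its label is decided entirely by the transition model and the neighboring positions. Runs of identical labels in `labels` correspond to the variable-length segments described above.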
Publication date: 2009